Overview

Dataset statistics

Number of variables: 12
Number of observations: 726
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 68.2 KiB
Average record size in memory: 96.2 B

Variable types

NUM: 9
CAT: 2
DATE: 1

Reproduction

Analysis started: 2020-06-04 13:49:11.581335
Analysis finished: 2020-06-04 13:49:22.867683
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

20PDT001 is highly correlated with 20FT001 and 1 other field (High Correlation)
20FT001 is highly correlated with 20PDT001 and 1 other field (High Correlation)
50TT002 is highly correlated with 20FT001 and 3 other fields (High Correlation)
50FT001 is highly correlated with 50TT002 and 2 other fields (High Correlation)
50PDT001 is highly correlated with 50FT001 and 1 other field (High Correlation)
50TV001 is highly correlated with 50FT001 and 2 other fields (High Correlation)

Variables

Date
Date

UNIFORM
UNIQUE
Distinct count: 726
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 KiB
Minimum: 2020-01-01 12:00:00
Maximum: 2020-06-30 18:00:00
Histogram

20TT001
Real number (ℝ≥0)

Distinct count: 172
Unique (%): 23.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.97272727272727
Minimum: 74.4
Maximum: 97.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 74.4
5-th percentile: 79.6
Q1: 84.7
Median: 89.4
Q3: 91.7
95-th percentile: 95.2
Maximum: 97.1
Range: 22.7
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.832504883
Coefficient of variation (CV): 0.05493185256
Kurtosis: -0.609557909
Mean: 87.97272727
Median Absolute Deviation (MAD): 4.141172051
Skewness: -0.3975460912
Sum: 63868.2
Variance: 23.35310345
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[74.4 79.05 80.75 82.45 84.55 ... 89.75 90.55 93.25 96.25 97.1 ], "bayesian blocks" binning strategy used)
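
Several of the figures reported for 20TT001 are simple functions of the others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below recomputes them from the values printed in this report as a consistency check. The variable names are illustrative, and this is not necessarily how pandas-profiling derives the numbers internally.

```python
import math

# Values copied from the 20TT001 statistics in this report.
n_obs = 726
total = 63868.2
minimum, maximum = 74.4, 97.1
q1, q3 = 84.7, 91.7
std = 4.832504883

mean = total / n_obs            # Sum / number of observations
value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(f"mean={mean:.8f} range={value_range:.1f} iqr={iqr:.1f} "
      f"cv={cv:.8f} variance={variance:.6f}")
```

The recomputed values can be compared against the Mean, Range, IQR, CV and Variance entries listed above; small differences in the last digits are expected because the inputs are rounded.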
Common values

Value Count Frequency (%)
90 17 2.3%
89.9 17 2.3%
90.3 14 1.9%
84.9 14 1.9%
90.5 12 1.7%
85.2 11 1.5%
90.2 11 1.5%
85 11 1.5%
84.7 10 1.4%
84.8 10 1.4%
Other values (162) 599 82.5%

Minimum 5 values

Value Count Frequency (%)
74.4 2 0.3%
74.6 2 0.3%
75.1 1 0.1%
75.3 1 0.1%
75.4 2 0.3%

Maximum 5 values

Value Count Frequency (%)
97.1 1 0.1%
96.6 1 0.1%
96.5 1 0.1%
96.3 1 0.1%
96.2 3 0.4%

20PT001
Categorical

CONSTANT
REJECTED
Distinct count: 1
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 KiB

Value Count Frequency (%)
40 726 100.0%

Length

Max length: 2
Mean length: 2
Min length: 2

Value Count Frequency (%)
Decimal_Number 2 100.0%

Value Count Frequency (%)
Common 2 100.0%

Value Count Frequency (%)
ASCII 2 100.0%

20FT001
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE
Distinct count: 726
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54906.15308539945
Minimum: 29005.2
Maximum: 71003.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 29005.2
5-th percentile: 38635.7075
Q1: 46989.555
Median: 58916.75
Q3: 61780.3975
95-th percentile: 65672.4475
Maximum: 71003.6
Range: 41998.4
Interquartile range (IQR): 14790.8425

Descriptive statistics

Standard deviation: 9399.66533
Coefficient of variation (CV): 0.1711951175
Kurtosis: -0.551575726
Mean: 54906.15309
Median Absolute Deviation (MAD): 8086.173096
Skewness: -0.6824490775
Sum: 39861867.14
Variance: 88353708.31
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[29005.2 30747.345 37797.615 40770.16 43511.46 ... 58625.245 64211.635 65983.5 69786.9 71003.6 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
60354.57 1 0.1%
56816.95 1 0.1%
60718.21 1 0.1%
58613.52 1 0.1%
60224.93 1 0.1%
46007.55 1 0.1%
62130.22 1 0.1%
70259.06 1 0.1%
60033.12 1 0.1%
46454.02 1 0.1%
Other values (716) 716 98.6%

Minimum 5 values

Value Count Frequency (%)
29005.2 1 0.1%
29342.98 1 0.1%
29381.37 1 0.1%
29516.16 1 0.1%
30244.88 1 0.1%

Maximum 5 values

Value Count Frequency (%)
71003.6 1 0.1%
70795.97 1 0.1%
70754.34 1 0.1%
70545.58 1 0.1%
70532.54 1 0.1%

20TT002
Real number (ℝ≥0)

Distinct count: 8
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.007438016528926
Minimum: 23.9
Maximum: 27.6
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 23.9
5-th percentile: 24.9
Q1: 25
Median: 25
Q3: 25.1
95-th percentile: 25.1
Maximum: 27.6
Range: 3.7
Interquartile range (IQR): 0.1

Descriptive statistics

Standard deviation: 0.1314105686
Coefficient of variation (CV): 0.005254859314
Kurtosis: 215.6755645
Mean: 25.00743802
Median Absolute Deviation (MAD): 0.06203811215
Skewness: 9.988584451
Sum: 18155.4
Variance: 0.01726873753
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[23.9 24.4 24.95 25.05 25.2 27.6 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
25 351 48.3%
25.1 193 26.6%
24.9 175 24.1%
25.4 2 0.3%
25.5 2 0.3%
27.6 1 0.1%
25.3 1 0.1%
23.9 1 0.1%

Minimum 5 values

Value Count Frequency (%)
23.9 1 0.1%
24.9 175 24.1%
25 351 48.3%
25.1 193 26.6%
25.3 1 0.1%

Maximum 5 values

Value Count Frequency (%)
27.6 1 0.1%
25.5 2 0.3%
25.4 2 0.3%
25.3 1 0.1%
25.1 193 26.6%

20PDT001
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 259
Unique (%): 35.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.32289807162534434
Minimum: 0.1
Maximum: 0.499
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 0.1
5-th percentile: 0.169
Q1: 0.24625
Median: 0.3545
Q3: 0.39
95-th percentile: 0.43075
Maximum: 0.499
Range: 0.399
Interquartile range (IQR): 0.14375

Descriptive statistics

Standard deviation: 0.09079862111
Coefficient of variation (CV): 0.2811990194
Kurtosis: -0.768624015
Mean: 0.3228980716
Median Absolute Deviation (MAD): 0.07788967056
Skewness: -0.4719858403
Sum: 234.424
Variance: 0.008244389596
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.1 0.111 0.1625 0.1955 0.2125 ... 0.3165 0.3605 0.3945 0.4255 0.499 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
0.378 9 1.2%
0.372 9 1.2%
0.375 8 1.1%
0.381 8 1.1%
0.39 8 1.1%
0.367 8 1.1%
0.373 7 1.0%
0.412 7 1.0%
0.387 7 1.0%
0.424 7 1.0%
Other values (249) 648 89.3%

Minimum 5 values

Value Count Frequency (%)
0.1 1 0.1%
0.101 1 0.1%
0.102 1 0.1%
0.103 1 0.1%
0.108 1 0.1%

Maximum 5 values

Value Count Frequency (%)
0.499 1 0.1%
0.495 1 0.1%
0.494 1 0.1%
0.491 2 0.3%
0.49 1 0.1%

50TT001
Real number (ℝ≥0)

Distinct count: 23
Unique (%): 3.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.54297520661157
Minimum: 9.5
Maximum: 11.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 9.5
5-th percentile: 9.6
Q1: 9.9
Median: 10.4
Q3: 11.3
95-th percentile: 11.7
Maximum: 11.7
Range: 2.2
Interquartile range (IQR): 1.4

Descriptive statistics

Standard deviation: 0.7494358095
Coefficient of variation (CV): 0.07108390134
Kurtosis: -1.360295495
Mean: 10.54297521
Median Absolute Deviation (MAD): 0.6690799809
Skewness: 0.3053668423
Sum: 7654.2
Variance: 0.5616540325
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 9.5 9.55 9.95 10.05 10.15 11.65 11.7 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
11.7 104 14.3%
10 79 10.9%
9.8 40 5.5%
9.7 40 5.5%
9.9 40 5.5%
10.1 40 5.5%
9.6 40 5.5%
9.5 36 5.0%
10.2 27 3.7%
10.5 20 2.8%
Other values (13) 260 35.8%

Minimum 5 values

Value Count Frequency (%)
9.5 36 5.0%
9.6 40 5.5%
9.7 40 5.5%
9.8 40 5.5%
9.9 40 5.5%

Maximum 5 values

Value Count Frequency (%)
11.7 104 14.3%
11.6 20 2.8%
11.5 20 2.8%
11.4 20 2.8%
11.3 20 2.8%

50PT001
Categorical

CONSTANT
REJECTED
Distinct count: 1
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.8 KiB

Value Count Frequency (%)
5 726 100.0%

Length

Max length: 1
Mean length: 1
Min length: 1

Value Count Frequency (%)
Decimal_Number 1 100.0%

Value Count Frequency (%)
Common 1 100.0%

Value Count Frequency (%)
ASCII 1 100.0%

50FT001
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 604
Unique (%): 83.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 277033.1855647383
Minimum: 70705.19
Maximum: 600000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 70705.19
5-th percentile: 100674.2125
Q1: 184674.5125
Median: 261029.755
Q3: 345044.515
95-th percentile: 523904.2375
Maximum: 600000
Range: 529294.81
Interquartile range (IQR): 160370.0025

Descriptive statistics

Standard deviation: 122273.2714
Coefficient of variation (CV): 0.4413668751
Kurtosis: -0.2220162359
Mean: 277033.1856
Median Absolute Deviation (MAD): 97388.98827
Skewness: 0.5965705302
Sum: 201126092.7
Variance: 1.49507529e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 70705.19 203558.175 359988.95 562657. 600000. ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
340285.29 5 0.7%
145099.15 5 0.7%
366171.29 4 0.6%
214600.62 4 0.6%
246694.71 4 0.6%
216564.67 4 0.6%
600000 4 0.6%
506534.57 3 0.4%
559279.88 3 0.4%
246808.29 3 0.4%
Other values (594) 687 94.6%

Minimum 5 values

Value Count Frequency (%)
70705.19 1 0.1%
75175.46 2 0.3%
77038.36 1 0.1%
77460.76 1 0.1%
78117.8 1 0.1%

Maximum 5 values

Value Count Frequency (%)
600000 4 0.6%
580239.84 2 0.3%
575859.43 1 0.1%
574206.43 1 0.1%
566034.12 1 0.1%

50TT002
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 100
Unique (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.16418732782369
Minimum: 16.1
Maximum: 27.1
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 16.1
5-th percentile: 16.9
Q1: 18.3
Median: 19.7
Q3: 21.575
95-th percentile: 24.9
Maximum: 27.1
Range: 11
Interquartile range (IQR): 3.275

Descriptive statistics

Standard deviation: 2.451872998
Coefficient of variation (CV): 0.1215954285
Kurtosis: -0.2006415271
Mean: 20.16418733
Median Absolute Deviation (MAD): 1.959665779
Skewness: 0.6787935567
Sum: 14639.2
Variance: 6.011681201
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[16.1 16.75 18.05 20.75 23.25 24.25 25.15 27.1 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
19.3 24 3.3%
18.3 22 3.0%
20.5 20 2.8%
19.4 19 2.6%
19.1 18 2.5%
19.6 17 2.3%
20.6 17 2.3%
19.8 16 2.2%
20 16 2.2%
18.5 14 1.9%
Other values (90) 543 74.8%

Minimum 5 values

Value Count Frequency (%)
16.1 1 0.1%
16.2 5 0.7%
16.3 3 0.4%
16.4 2 0.3%
16.5 5 0.7%

Maximum 5 values

Value Count Frequency (%)
27.1 2 0.3%
27 1 0.1%
26.9 1 0.1%
26.8 2 0.3%
26.6 4 0.6%

50PDT001
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 371
Unique (%): 51.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3113801652892562
Minimum: 0.036
Maximum: 1.11
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 0.036
5-th percentile: 0.06
Q1: 0.13
Median: 0.245
Q3: 0.398
95-th percentile: 0.8615
Maximum: 1.11
Range: 1.074
Interquartile range (IQR): 0.268

Descriptive statistics

Standard deviation: 0.2387666906
Coefficient of variation (CV): 0.7668012198
Kurtosis: 1.183853761
Mean: 0.3113801653
Median Absolute Deviation (MAD): 0.1808103499
Skewness: 1.338218457
Sum: 226.062
Variance: 0.05700953252
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.036 0.114 0.1255 0.2905 0.3025 0.446 0.9845 1.1085 1.11 ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
0.181 8 1.1%
0.115 8 1.1%
0.07 8 1.1%
0.123 8 1.1%
0.174 7 1.0%
0.298 7 1.0%
0.221 7 1.0%
0.065 6 0.8%
0.301 6 0.8%
0.2 6 0.8%
Other values (361) 655 90.2%

Minimum 5 values

Value Count Frequency (%)
0.036 2 0.3%
0.038 2 0.3%
0.039 3 0.4%
0.041 1 0.1%
0.042 1 0.1%

Maximum 5 values

Value Count Frequency (%)
1.11 2 0.3%
1.109 1 0.1%
1.108 1 0.1%
1.043 2 0.3%
1.027 1 0.1%

50TV001
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 84
Unique (%): 11.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.46176308539944905
Minimum: 0.12
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.8 KiB

Quantile statistics

Minimum: 0.12
5-th percentile: 0.17
Q1: 0.31
Median: 0.435
Q3: 0.5775
95-th percentile: 0.87
Maximum: 1
Range: 0.88
Interquartile range (IQR): 0.2675

Descriptive statistics

Standard deviation: 0.2036018537
Coefficient of variation (CV): 0.4409227592
Kurtosis: -0.2222411765
Mean: 0.4617630854
Median Absolute Deviation (MAD): 0.1622069683
Skewness: 0.5986988252
Sum: 335.24
Variance: 0.04145371483
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.12 0.335 0.595 0.935 1. ], "bayesian blocks" binning strategy used)
Common values

Value Count Frequency (%)
0.36 24 3.3%
0.48 24 3.3%
0.37 20 2.8%
0.47 20 2.8%
0.58 19 2.6%
0.49 17 2.3%
0.41 17 2.3%
0.39 17 2.3%
0.57 17 2.3%
0.4 17 2.3%
Other values (74) 534 73.6%

Minimum 5 values

Value Count Frequency (%)
0.12 1 0.1%
0.13 7 1.0%
0.14 8 1.1%
0.15 6 0.8%
0.16 12 1.7%

Maximum 5 values

Value Count Frequency (%)
1 4 0.6%
0.97 2 0.3%
0.96 2 0.3%
0.94 1 0.1%
0.93 7 1.0%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the slope of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
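
The calculation described above can be sketched in a few lines of standard-library Python. The function name `pearson_r` is illustrative, and the input values are the first ten 20FT001 and 20PDT001 readings from the sample rows of this report, so the result describes only that small excerpt, not the full dataset. The second call rescales one variable to illustrate the invariance under changes in location and scale.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# First ten sample rows of 20FT001 and 20PDT001 (copied from the report).
ft = [60898.59, 60422.47, 60996.54, 59710.11, 59863.03,
      61460.47, 60643.57, 60192.41, 59473.81, 59414.94]
pdt = [0.378, 0.374, 0.380, 0.365, 0.366,
       0.385, 0.375, 0.369, 0.361, 0.359]

r = pearson_r(ft, pdt)
# Shifting and positively scaling one variable leaves r unchanged.
r_rescaled = pearson_r([2.0 * x + 100.0 for x in ft], pdt)
print(r, r_rescaled)
```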

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
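
As a minimal sketch of that definition, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative, and giving tied values the average of their rank positions is an assumption of this sketch. The example data is synthetic: y is the cube of x, a relationship that is perfectly monotonic but not linear.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]      # monotonic but nonlinear
rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this data the rank correlation reaches its maximum while Pearson's r stays below 1, which is the difference between the two measures described above.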

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
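
The pair-counting definition can be sketched directly, as below. The function name `kendall_tau` is illustrative. This follows the formula exactly as stated here, so tied pairs count as neither concordant nor discordant; library implementations that apply a correction for ties may return different values on data with many repeated readings. The example data is synthetic.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y: counted as neither.
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]
tau_same = kendall_tau(xs, [1, 2, 3, 4])      # identical ordering
tau_reversed = kendall_tau(xs, [4, 3, 2, 1])  # fully reversed ordering
tau_swapped = kendall_tau(xs, [1, 3, 2, 4])   # one swapped pair out of six
print(tau_same, tau_reversed, tau_swapped)
```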

Missing values

Sample

First rows

     Date                 20TT001  20PT001  20FT001   20TT002  20PDT001  50TT001  50PT001  50FT001    50TT002  50PDT001  50TV001
0    2020-01-01 12:00:00  89.9     40       60898.59  25.0     0.378     10.0     5        291262.84  19.3     0.291     0.49
1    2020-01-01 18:00:00  90.6     40       60422.47  25.0     0.374     10.0     5        301308.55  19.1     0.309     0.50
2    2020-01-02 00:00:00  90.0     40       60996.54  25.1     0.380     10.0     5        297020.06  19.2     0.301     0.50
3    2020-01-02 06:00:00  90.2     40       59710.11  25.1     0.365     10.0     5        280091.64  19.6     0.278     0.47
4    2020-01-02 12:00:00  89.3     40       59863.03  25.1     0.366     10.0     5        280091.64  19.5     0.279     0.47
5    2020-01-02 18:00:00  90.0     40       61460.47  25.1     0.385     10.1     5        307031.88  19.0     0.320     0.51
6    2020-01-03 00:00:00  89.1     40       60643.57  24.9     0.375     10.1     5        292820.63  19.2     0.293     0.49
7    2020-01-03 06:00:00  87.8     40       60192.41  24.9     0.369     10.1     5        278609.38  19.3     0.276     0.46
8    2020-01-03 12:00:00  86.1     40       59473.81  25.0     0.361     10.1     5        257857.76  19.6     0.239     0.43
9    2020-01-03 18:00:00  86.0     40       59414.94  25.0     0.359     10.1     5        257857.76  19.6     0.239     0.43

Last rows

     Date                 20TT001  20PT001  20FT001   20TT002  20PDT001  50TT001  50PT001  50FT001    50TT002  50PDT001  50TV001
716  2020-06-28 12:00:00  84.7     40       71003.60  25.1     0.499     11.7     5        518776.81  17.2     0.846     0.86
717  2020-06-28 18:00:00  85.7     40       67284.09  25.0     0.452     11.7     5        459038.83  17.7     0.674     0.77
718  2020-06-29 00:00:00  84.9     40       65419.54  25.0     0.429     11.7     5        396038.83  18.4     0.512     0.66
719  2020-06-29 06:00:00  87.7     40       65542.99  25.1     0.431     11.7     5        443541.57  18.0     0.632     0.74
720  2020-06-29 12:00:00  88.4     40       66781.63  25.0     0.447     11.7     5        489775.74  17.6     0.759     0.82
721  2020-06-29 18:00:00  87.3     40       65889.61  24.9     0.436     11.7     5        458275.74  17.8     0.671     0.76
722  2020-06-30 00:00:00  87.6     40       66661.38  25.1     0.445     11.7     5        475030.73  17.7     0.718     0.79
723  2020-06-30 06:00:00  88.8     40       64044.87  25.0     0.414     11.7     5        439514.21  18.1     0.621     0.73
724  2020-06-30 12:00:00  79.3     40       64200.21  25.0     0.412     11.7     5        317542.32  18.9     0.341     0.53
725  2020-06-30 18:00:00  79.1     40       62763.88  25.1     0.394     11.7     5        289528.98  19.4     0.296     0.48